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Abstract:  We  describe  an  algorithm  for  the  direct  solution  of  sys¬ 
tems  of  linear  algebraic  equations  associated  with  the  discretiza¬ 
tion  of  boundary  integral  equations  with  non-oscillatory  kernels 
in  two  dimensions.  The  algorithm  is  “fast”  in  the  sense  that  its 
asymptotic  complexity  is  0(N  log*  N),  where  N  is  the  number  of 
nodes  in  the  discretization,  and  k  depends  on  the  kernel  and  the 
geometry  of  the  contour  (k  —  1  or  2).  Unlike  previous  fast  tech¬ 
niques  based  on  iterative  solvers,  the  present  algorithm  directly 
constructs  a  sparse  factorization  of  the  inverse  of  the  matrix;  thus 
it  is  suitable  for  problems  involving  relatively  ill-conditioned  ma¬ 
trices,  and  is  particularly  efficient  in  situations  involving  multiple 
right  hand  sides.  The  performance  of  the  scheme  is  illustrated 
with  several  numerical  examples. 

1.  Introduction 

Boundary  value  problems  of  classical  potential  theory  are  ubiquitous  in  engineer¬ 
ing  and  physics.  Most  such  problems  can  be  reduced  to  boundary  integral  equations 
which  are,  from  a  mathematically  point  of  view,  more  tractable  than  the  original 
differential  equations.  Although  the  mathematical  benefits  of  such  reformulations 
were  realized  and  exploited  in  the  19th  century,  until  recently  boundary  integral 
equations  were  rarely  used  as  a  numerical  tool,  since  most  integral  equations  upon 
discretization  turn  into  dense  matrices.  In  the  1980’s,  the  cost  of  applying  dense 
matrices  resulting  from  potential  theory  to  arbitrary  vectors  was  greatly  reduced 
by  the  development  of  “fast”  algorithms  (Fast  Multipole  Methods,  panel  clustering, 
wavelets,  etc.).  Combining  fast  matrix- vector  multiplication  techniques  with  itera¬ 
tive  schemes  for  the  solution  of  large-scale  systems  of  linear  algebraic  equations,  it 
became  possible  to  solve  well-conditioned  boundary  integral  equations  of  potential 
theory  in  0(n)  operations.  Today,  such  combinations  are  in  many  environments  the 
fastest  and  most  accurate  numerical  solution  techniques  available.  Iterative  linear 
solvers  have  certain  drawbacks  though;  we  briefly  discuss  these  below. 

(1)  The  number  of  iterations  required  by  an  iterative  solver  is  sensitive  to  the 
spectral  properties  of  the  matrix  of  the  system  to  be  solved;  for  sufficiently 
ill-conditioned  problems,  the  number  of  iterations  is  proportional  to  n.  Since 
each  iteration  (with  FMM  acceleration)  requires  0(n)  operations,  the  total 
operation  count  is  proportional  to  n2.  While  this  is  still  better  than  the 
0(n3)  estimate  associated  with  direct  solvers,  in  many  situations  0(n2)  is 
not  acceptable. 
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(2)  When  one  needs  to  solve  a  collection  of  problems  involving  a  single  matrix 
and  multiple  right-hand  sides,  the  CPU  time  requirements  of  most  iterative 
algorithms  are  close  to  the  time  required  to  solve  one  problem  multiplied  by 
the  number  of  problems  to  be  solved.  With  most  direct  solvers,  the  situation 
is  different;  once  the  matrix  has  been  factored,  applying  its  inverse  to  each 
additional  right-hand  side  costs  very  little. 

(3)  When  a  collection  of  linear  systems  have  to  be  solved  whose  matrices  are  in 
some  sense  “close”  to  each  other,  iterative  algorithms  derive  very  little  (if 
any)  advantage  from  the  closeness  of  the  matrices. 

(4)  Most  direct  schemes  for  the  solution  of  linear  systems  are  closely  related  to 
efficient  algorithms  for  the  construction  of  their  Singular  Value  Decomposi¬ 
tions  and  certain  other  matrix  factorizations  (L-R,  Q-R,  etc.).  The  simplest 
such  scheme  is  probably  the  inverse  power  method  with  shifts  (see,  for  ex¬ 
ample,  [5]),  which  converts  any  algorithm  for  the  solution  of  a  linear  system 
into  an  algorithm  for  the  determination  of  a  prescribed  singular  value.  It¬ 
erative  techniques  do  not  provide  such  a  capability,  except  via  the  so-called 
Lanczos  schemes,  which  tend  to  be  quite  inefficient  (see,  for  example,  [12]). 


The  subject  of  this  paper  is  a  numerical  technique  that  is  intended  to  overcome 
these  shortcomings  by  directly  producing  a  sparse  factorization  of  the  inverse  of 
the  matrix.  When  applied  to  contour  integral  equations  of  potential  theory  whose 
kernels  are  non-oscillatory,  the  asymptotic  complexity  of  the  solver  is  0{n  log*  n), 
where  k  depends  on  the  geometry  and  the  kernel  (k  =  1  or  2).  When  applied 
to  problems  involving  oscillatory  kernels,  the  asymptotic  complexity  deteriorates 
as  the  wave-number  increases  but  the  scheme  remains  viable  up  to  objects  about  a 
thousand  wavelengths  in  size.  The  factorization  technique  described  in  this  paper  is  a 
multilevel  extension  of  the  compression  technique  described  in  [10].  The  machinery 
underlying  these  techniques  applies  generally  to  matrices  with  rank-deficient  off- 
diagonal  submatrices;  contour  integral  equations  have  been  chosen  by  the  authors 
simply  as  the  most  straightforward  application. 

It  is  not  the  purpose  of  this  paper  to  provide  an  exhaustive  survey  of  the  literature 
on  the  subject  we  are  addressing.  A  number  of  researchers  have  observed  that 
matrices  with  rank-deficient  off-diagonal  blocks  admit  “fast”  factorizations  (see  [7], 
[8]);  others  have  constructed  “fast”  algorithms  in  various  environments  (see  [1],  [2], 
[3],  [4])  where  the  operators  in  question  posses  rank-deficient  off-diagonal  blocks, 
without  using  this  property  explicitly.  However,  we  observe  that  the  algorithm  of  this 
paper  is  closely  related  to  the  scheme  described  in  [11].  In  fact,  our  algorithm  could 
be  viewed  as  a  modification  of  the  algorithm  of  [11]  that  replaces  “elongated”  objects 
in  two  or  three  dimensions  with  “curves” ,  extends  the  class  of  kernels  addressed  by 
[11],  and  introduces  modifications  in  the  scheme  of  [11]  that  are  necessary  for  this 
extension  to  work. 

The  paper  is  organized  as  follows:  In  Section  2  we  introduce  our  notation  and  list 
certain  facts  about  compression  of  rank-deficient  matrices.  In  Section  3  we  demon¬ 
strate  that  the  inverse  of  a  matrix  with  rank-deficient  off-diagonal  blocks  possesses 
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a  sparse  hierarchical  factorization.  In  Section  4  we  present  a  generic  numerical  tech¬ 
nique  for  constructing  the  factorization  described  in  Section  3.  In  Section  5  we  show 
how  the  generic  numerical  technique  presented  in  Section  4  can  be  improved  fur¬ 
ther  when  applied  to  contour  integral  equations.  In  Section  6  we  illustrate  through 
numerical  examples  the  efficiency  of  the  technique  presented  in  Section  5  when  ap¬ 
plied  to  a  number  of  different  kernels  and  contours.  In  Section  7  we  summarize  our 
findings  and  discuss  possible  extensions  and  generalizations. 


2.  Preliminaries 


2.1.  Notation.  Throughout  the  paper,  we  use  upper  case  letters  for  matrices  and 
lower  case  letters  for  vectors  and  scalars.  The  canonical  unit  vectors  in  C”  are 
denoted  by  ed.  Given  a  matrix  X  €  CTnxn,  we  let 

X*  denote  its  adjoint  (the  complex  conjugate  transpose), 

c Tk{X )  denote  its  fc-th  singular  value, 

||X||2  denote  its  l2  operator  norm, 

IP^Hf  denote  its  Erobenius  norm, 

xf  £  Clxn  denote  its  i-th  row  and 
Xj  £  Cmxl  denote  its  j-th  column. 

Given  matrices  A,  B,  C  and  D  we  let 


(2.1) 


[AB\, 


A 

C  ’ 


and 


A  B 
C  D  ’ 


denote  larger  matrices  obtained  by  stringing  the  blocks  A,  B,  C  and  D  together. 


Definition  1  (Permutations  vectors).  Given  a  positive  integer  n,  we  define 

(2.2)  J„  =  the  set  of  permutations  of  the  integers  {1, . . .  ,n}. 

Given  two  integers  k  and  n  such  that  1  <  k  <  n,  we  define 

(2.3)  jjk  =  the  set  of  subsets  of  size  k  of  elements  of  J„. 

In  other  words,  if  J  £  l£,  then  J  is  a  vector  of  integers 

(2.4)  J  =  [ji, . . . ,  jfc], 
where  1  <  ji  <  n  and  all  ji's  are  different. 


Definition  2  (Submatrix).  When  we  use  the  term  “submatrix”  we  do  not  insist  that 
the  submatrix  must  form  a  contiguous  block.  To  be  precise,  we  say  that  B  €  Ckxl 
is  a  submatrix  of  A  £  Cmxn,  if  there  exist  permutations  I  —  [ii, . . .  ,ifc]  €  J ^  and 
J  =  [ji,  •  •  ■ ,  ji]  G  li  such  that 

(2.5)  bpq  =  aipjqi  p=l,...,k,  q  =  l,...,l. 


Definition  3  (Neutered  rows  and  columns).  Let  A  be  a  matrix  consisting  of  p  x  p 
blocks, 


(2.6) 


A  = 


A^p)  ■ 

A&p) 


4 


We  refer  to  the  submatrix  formed  by  all  blocks  on  the  z-th  row  except  the  diagonal 
one,  i.e. 

(2.7)  ^(m+i) . . . 

as  the  z-th  neutered  row  of  blocks.  A  neutered  column  of  blocks  is  defined  analo¬ 
gously. 

2.2.  Compression  of  matrices.  In  this  section  we  state  a  theorem  on  matrix  com¬ 
pression  that  forms  the  foundation  of  the  matrix  factorization  technique  presented 
later  in  this  paper.  Roughly  speaking,  the  theorem  asserts  that  given  a  matrix  A  of 
rank  A:,  it  is  possible  to  pick  k  of  its  columns  that  form  a  well-conditioned  basis  for 
the  remaining  columns.  It  was  first  reported  in  slightly  different  form  in  [6]. 

Theorem  1.  Given  an  arbitrary  matrix  A  E  Cmx"  and  an  integer  k  such  that 
1  <  k  <  min(m,n),  there  exists  a  (not  necessarily  unique)  matrix  T  €  Ckx^n~k^  and 
a  permutation  J  =  [ji, . . . ,  jn]  £  J„  such  that 

(2.8)  A2  =  A{T  +  E. 

Here,  A\  and  A2  are  matrices  formed  by  the  columns  of  A, 

(29)  ^  =  [«*,-•.  «i,  ]£C-Xl, 

^  =  [<W 

the  elements  of  the  matrix  T  £  Cfcx("-fc)  satisfy 
(2.10)  \Tij\  <  1,  for  l  <  i  <  k,  1  <  j  <  n  —  k, 

and  the  matrix  E  E  Cmx("~fc)  satisfies  the  inequality 
(2-11)  \\E\\2<crk+1(AWl  +  k(n-k), 

where  <Jk+i(A)  is  the  (k  +  l)-th  singular  value  of  A. 

Remark  4  (Computational  complexity.).  While  Theorem  1  asserts  the  theoretical 
existence  of  a  matrix  T  and  a  permutation  J  with  certain  properties,  it  does  not 
address  the  question  of  how  to  determine  these  numerically.  In  fact,  the  authors  are 
not  aware  of  any  algorithm  that  finds  these  objects  in  polynomial  time.  However,  in 
[6]  an  algorithm  is  presented  that  finds  a  matrix  T  and  a  permutation  J  such  that 
all  statements  of  Theorem  1  still  hold,  except  that  (2.10)  and  (2.11)  are  replaced  by 
the  weaker  inequalities 

(2.12)  \Tij\  <  \fn,  for  1  <i<k,  1  <  j  <  n  —  k, 
and 

(2.13)  ||£||2  <  <rfc+1(A) \/ 1  +  nk(n  -  k). 

When  m>n,  the  computational  complexity  of  this  algorithm  is  typically  0(mnk), 
the  same  as  for  the  pivoted  QR-factorization.  In  rare  cases,  the  computational 
complexity  may  be  somewhat  larger  but  it  never  exceeds  0(mn2). 
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Observation  5  (Column  compression).  When  applied  to  a  matrix  A  £  Cmx”  of 
rank  k,  Theorem  1  asserts  that  there  exists  a  well-conditioned  column  operation 
that  leaves  k  of  the  columns  of  A  unchanged  while  mapping  the  remaining  n  —  k 
columns  to  zero.  More  specifically,  let  us  define 

(2.14)  R  =  Pj  l 

where  T  and  J  are  defined  in  Theorem  1  and  the  permutation  matrix  Pj  is  defined 
by 

(2.15)  Pj  =  [eh,...,ejn}eCn*n. 

Then 

(2.16)  AR=  [ACS0]  e€mxn, 

where  the  “column  skeleton”  Acsi  is  formed  by  k  of  the  columns  of  A\ 

(2.17)  Acs  =  [an,...,ajk\eCmxk. 

Moreover,  by  virtue  of  (2.10)  and  the  identity 

(2.18)  R-'=  [T  PI 

it  is  clear  that 

(2.19)  ||J?||f  <  y/n  +  k(n  -  k),  and  | |i?—1 1 |p  <  y/n  +  k(n  —  k ). 

Observation  6  (Row  compression).  The  argument  of  Observation  5  can  equally 
well  be  applied  to  the  rows  of  a  matrix  A  e  Cmxn  of  rank  k.  Thus,  there  exists  a 
matrix  L  £  RmXm  such  that 

(2.20)  LA=  eCmxn, 

where  the  “row  skeleton”  Ars  £  Cfexn  is  formed  by  k  of  the  rows  of  A  and 

(2.21)  ||T||f  <  y/m  +  k(m  —  k),  and  ||L_1||f  <  y/m  +  k(m  —  k). 

3.  Analytical  apparatus 
Consider  a  p  x  p  block  matrix 

'  4(11)  . . .  4(1  P) 

(3.1)  A=  :  : 

4(pi)  . . .  4(pp) 

such  that  any  neutered  row  or  column  of  blocks  is  rank-deficient.  In  this  section 
we  derive  compressed  factorizations  of  the  inverse  of  such  a  matrix.  Lemmas  2  and 
3  provide  factorizations  for  the  case  p  =  2.  Observation  8  extends  the  results  of 
Lemma  3  to  a  general  p.  Observation  9  introduces  hierarchical  factorizations  that 
further  improve  the  efficiency. 

Lemma  2  below  asserts  that  for  a  given  2x2  block  matrix  with  rank-deficient 
off-diagonal  blocks,  there  exist  well-conditioned  row-  and  column-operations  that  (i) 
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introduce  zeros  in  the  off-diagonal  blocks  and  (ii)  leave  the  remaining  elements  in 
the  off-diagonal  blocks  untouched. 


Lemma  2.  Let  A  be  a  non-singular  matrix 

-  4(H)  4(12)  ‘ 

4(21)  4(22) 


where  G  Cnxn,  4(22)  g  £mxm  and  ^ e  0ffdiagonal  blocks  A W  €  CnXm,  AW  G 
Cmxn  have  rank  k  <  min (m,n).  Then  there  exist  matrices  R,  L  €  Cnxn  such  that 


(3.3) 


'10' 

01 

4(11)  4(12) 
4(21)  4(22) 


R  0 
0  I 


Xn 

Xn 

A 12) 
ars 

X21 

x22 

0 

4(21) 

^cs 

0 

4(22) 

Here,  the  matrix  G  Ckxm  consists  of  k  of  the  rows  of  AW)  and  the  matrix 

4?  €  Cmxk  consists  of  k  of  the  columns  of  A^21\  Moreover,  Xu  G  Ckxk,  X12  G 
Mfcx(n-fc);  X2i  e  R(n-fe)xfc;  Xa2  €  R(n-fc)x(n-fc);  fl„d  the  matrices  R  and  L  satisfy 

(2.19)  and  (2.21),  respectively. 


Proof:  Due  to  Observations  5  and  6,  there  exist  matrices  R,L  G  <Cnxn  such  that 


(3.4) 


LAW  = 


4(12) 

1RS 

0 


and  AWr  =  j4g)  0]  , 


where  4^  and  4s  ^  are  submatrices  of  AW  and  A^21\  respectively.  The  identity 
(3.3)  now  follows  by  partitioning 


(3.5) 


R  =  [Ri  R2] , 


where  Li  G  Ckxn,  L2  G  dn-fc>xn, 
where  Rx  G  Cnxk,  R2  G  Cnx^~k\ 


and  setting 

Xu  =  LiAWRi  e  Ckxk , 


(3.6) 


X12  =  LxAWr2  e  cfcx("-fc), 

X21  =  L2aWR  1  €  c("-*)xfc, 

X22  =  L2AWr2  g  C(n-fc)x(n-fc)< 


□ 

The  following  lemma  uses  the  results  of  Lemma  3  to  reduce  the  problem  of  fac¬ 
toring  the  inverse  of  the  matrix  A  in  (3.2)  to  the  problem  of  factoring  the  inverse  of 
the  smaller  matrix  A  in  (3.8). 


Lemma  3.  Let  A,  Xn,  X12,  X21,  X22,  4^  an^  -4s^  as  ®n  Lemma  2.  Provided 
that  the  matrix  X22  in  (3.3)  is  non-singular,  there  exist  matrices  B  G  <Cnxk,  C  G 
Ckxn  and  D  G  Cnxn  such  that 


A"1 


B  0 
0  I 


t-i 


c  0 

0  I 


+ 


D  0 
0  0 
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(3.7) 
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where 


(3.8) 

and 

(3.9) 


A 


i(11)  4? 

A™  X(22) 


^  £(^m)x(k+m) 


i(11)  =  xu  -  XuX^Xai  e  Cfcxfc. 


Proof:  We  let  L\,  L2,  R\  and  R2  be  defined  by  (3.5).  Inverting  both  sides  of 
equation  (3.3),  we  obtain  the  identity 

/o  in\  a-  1  [  Ri  R2  0  1  y11 

(3.10)  A  -  n  n  r  A  21 

L  U  U  1  J  ,(21) 

.  ^cs 

Since  X22  is  non-singular, 

(3.11) 


*12 

1 

e*  CO 

-1 

‘  Li 

0  ' 

*22 

0 

L2 

0 

0 

4(22) 

0 

7 

Xu 

X21 

,(21) 

^CS 


Xx2 

X22 

0 


,02) 

^RS 

0 

A(22) 


-,-1 


-X2-21X21 


Xn-XviX£X2i  4s2) 


,(21) 

^cs 


A(22) 


Ik 

0 


-x12x£ 

0 


Ik 

K21 

0  /„ 

Now  we  obtain  (3.7)  by  combining  (3.10)  and  (3.11)  and  setting 


'  0 

0 

0  ' 

+ 

0 

X221 

0 

- 

0 

0 

0 

B  =  R\  —  R2X00  X21  £  C 


-mxfc 


(3.12) 


C  —  L\  —  X12X02  L2  €  C' 


-«/cxr 


D  =  R2XZ2L2  £  C 


□ 


Remark  7  (Symmetric  factorizations).  It  is  possible  to  force  the  factorization  (3.7) 
to  be  symmetric  in  the  sense  that  R  —  L*  (which  does  not  imply  that  C  =  B*  unless 
A  itself  is  Hermitian).  To  this  end,  we  define  L  and  Jr,  as  the  matrix  and  index 
vector  that  compress  the  rows  of  the  matrix  [A^  g  Knx2m  (rather  than 

the  rows  of  A^  alone),  and  set  R  —  L*  and  Jc  =  Jr.  This  modification  typically 
results  in  a  poorer  compression  ratio  but  may  dramatically  improve  the  conditioning 
of  the  transformation  matrices,  as  discussed  in  Section  4.4. 


Observation  8  (One- level  compression  of  a  block  matrix).  Consider  a  matrix 

4(11)  . . .  yi(ip) 


(3.13) 


A  = 


,4 (pi)  . . .  j[(pp) 


where  A^  £  C"xn  for  i,  j  —  1, . . .  ,p.  We  assume  that  any  neutered  row  or  column 
of  blocks  has  rank  at  most  k.  Lemma  3  can  be  used  to  reduce  the  problem  of 
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FIGURE  1.  A  3  x  3  matrix  [AW)]fJ=1  is  compressed  in  three  steps, 
cf.  Observation  8.  In  step  j  =  1,2.3,  the  single-block  compression  of 
Lemma  3  is  applied  to  compress  the  interaction  between  and 
the  rest  of  the  matrix.  Black  blocks  represents  entries  that  have 
not  been  changed  beyond  row  and  column  permutations  and  grey 
represents  entries  that  have  been  updated  but  are  not  (necessarily) 
zero. 


inverting  A  to  the  problem  of  inverting  the  smaller  matrix 


(3.14)  A  = 

’  i( li)  . . .  i(ip)  ‘ 

where  A^  £  Ckxk  for  i,j  =  1, 

. . .  Afw) 

. . ,  p,  and  A^rt  is  a 

More  specifically,  applying  Lemma  3  to  each  of  the  p  diagonal  blocks  of  A,  we 
obtain  the  factorization 


(3.15)  A_l  = 

’  Bi  0 

0  b2 

•  0  ‘ 
0 

A~l 

'  Ci  o  • 

o  c2 

o  o  •  • 

+ 

'  Di  0  •• 

0  d2 

0  ' 

0 

0  0  • 

•  Bp  . 

.00- 

•  Cp, 

0  0  •• 

•  Dp. 

where  B{  £  Cnxfe,  Q  £  Ckxn  and  Dt  €  CnXn,  for  i  =  1, . . .  ,p. 

The  single-level  matrix  compression  is  illustrated  graphically  in  Fig.  1. 


Observation  9  (Hierarchical  compression  of  a  block  matrix).  Observation  8  re¬ 
duces  the  problem  of  inversion  of  a  block  matrix  with  rank-deficient  neutered  rows 
and  columns  to  the  problem  of  inversion  of  a  block  matrix  with  smaller  blocks.  If 
by  clustering  these  smaller  blocks,  we  can  create  a  matrix  with  off-diagonal  rank- 
deficiencies,  then  the  process  can  be  repeated  recursively  to  further  improve  the 
compression. 

More  specifically,  let  us  change  notation  so  that  the  objects  labelled  A,  A  and  k 
in  Observation  8  are  now  labelled  A^,  A^  and  fcj,  respectively.  Equation  (3.15) 
then  reads 

(3.16)  (A^)'1  =  £W(A{1))_1C7(1)  +  £>« 

where  B M,  C^l\  are  block  diagonal  matrices  whose  p  diagonal  blocks  are  of 
sizes  n  x  ki,  k\  x  n,  n  x  n,  respectively.  We  then  cluster  the  blocks  of  the  matrix 


!  Compress 


/"  J.  Compress  /*  i  Compress 
Cluster  Cluster 


FIGURE  2.  An  8  x  8  block  matrix  is  compressed  through  a  three- 
level  compression  scheme  in  the  vein  of  Observation  9.  The  grey 
scale  coding  is  the  same  as  in  Fig.  1. 


A1'1'1  to  form  a  matrix  A^  with  (p/2)  x  (p/2)  blocks  of  size  2k\  x  2k\  and  apply  the 
factorization  (3.16)  to  it,  thus  obtaining  a  telescoped  factorization 


(3.17) 


(A(1))  =  B(1)  £(2)(A(2))  C(2)  +  D(2)  <7(1)  4-  £>{1). 


Here,  A$\  B®\  C^2\  D^>  are  all  block  matrices  with  (p/2)  x  (p/2)  blocks.  Letting 
&2  denote  the  rank  of  the  neutered  rows  and  columns  of  A®,  the  blocks  of  A/2)  have 
size  k%  x  &2)  while  C&\  are  diagonal  block  matrices  with  diagonal  blocks 
of  sizes  2ki  x  &2,  k^  x  2k\  and  2fci  x  2k\,  respectively.  This  process  can  be  continued 
until  no  further  clustering  is  advantageous. 

The  multi-level  matrix  compression  is  illustrated  graphically  in  Fig.  2. 


Remark  10  (Adjoint  of  the  inverse).  Obviously,  the  factorizations  (3.15)  and  (3.17) 
provide  a  mechanism  for  the  accelerated  application  of  both  A-1  and  [A-1]*. 

Remark  11  (Block  sizes).  In  Observations  8  and  9,  it  was  assumed  that  all  blocks 
within  one  of  the  matrices  A,  A,  A^,  A^2\. . . ,  have  the  same  size.  This  assumption 
was  made  for  notational  convenience  only  and  is  in  no  way  essential  to  the  results. 

4.  A  GENERAL  ALGORITHM  FOR  THE  COMPUTING  A  SPARSE  INVERSE 

In  Section  3  we  demonstrate  the  existence  of  a  compact  factorization  of  the  inverse 
of  any  block  matrix  whose  neutered  rows  and  columns  of  blocks  are  rank-deficient. 
In  this  section,  we  describe  a  numerical  scheme  for  the  construction  of  such  factor¬ 
izations,  and  estimate  its  efficiency. 


Remark  12.  The  inversion  scheme  presented  in  this  section  is  fairly  generic,  de¬ 
pending  only  on  the  ranks  of  off-diagonal  blocks  of  the  matrix  to  be  inverted.  In 
situations  where  the  structure  of  the  matrix  is  known,  further  improvements  are 
possible.  For  instance,  when  applied  to  a  dense  n  x  n  matrix  resulting  from  the 
discretization  of  a  contour  integral  operator,  the  generic  algorithm  of  this  section 
requires  0(n2)  arithmetic  operations  to  construct  its  inverse,  while  the  customized 
technique  presented  in  Section  5  requires  0{n  log2  n)  operations  or  less,  depending 
on  the  integral  operator. 

4.1.  Single  block  compression.  Lemmas  2  and  3  assert  that  the  inverse  of  a  2  x  2 
block  matrix  of  the  form  (3.2)  can  be  factored  in  the  compressed  form  (3.7).  The 
quantities  A^n\  R,  L,  and  that  appear  in  (3.7)  can  be  determined  by 
taking  the  following  steps: 

(1)  Determine  a  matrix  L  £  Cnxn  and  a  permutation  Jr  £  j£  such  that 

LA^  -  f  4ts}  1 
0  J 

where  Ap1^  is  formed  by  the  k  rows  of  AW  specified  by  Jr,  as  described 
in  Observation  6. 

(2)  Determine  a  matrix  R  £  Cnxn  and  a  permutation  Jq  £  such  that 

AWr  =  [Ags1}  o] , 

where  Ags^  is  formed  by  the  columns  of  AW  specified  by  Jc,  as  described 
in  Observation  5. 

(3)  Partition  R  and  L  as  specified  in  (3.5)  and  form  the  blocks  Xl}  as  in  (3.6). 

(4)  Compute  Ai^11),  B,  C  and  D  using  the  formulas  (3.9)  and  (3.12). 

Steps  (1)  and  (2)  require  0(mnk)  floating  point  operations  while  steps  (3)  and  (4) 
require  0(n3)  operations.  The  total  cost  is  thus  0(mnk  +  n3). 

4.2.  Single-level  compression.  Let  A  denote  a  matrix  consisting  of  p  x  p  blocks, 
each  of  size  n  x  n,  in  which  every  neutered  row  or  column  has  rank  k  such  that  k  <  n. 
Observation  8  states  that  such  a  matrix  can  be  factored  in  the  sparse  form  (3.15). 
This  factorization  contains  the  entities  Bi,  Ci,  Di,A W)  for  i,j  —  l,...,p,  which 
can  be  computed  through  p  applications  of  the  single-block  compression  technique 
of  Section  4.1  —  one  application  for  each  diagonal  block.  Each  one  of  the  p  steps 
requires  0(pkn 2  +  n 3)  floating  point  operations  resulting  in  a  total  computational 
cost  of  0(p2kn2  +  pn3). 

Remark  13.  The  off-diagonal  blocks  of  the  compressed  matrix  A  are  never  explicitly 
computed.  Instead,  the  block  AAy)  £  Ckxk  is  specified  by  giving  the  index  vectors 
Jr^,  Jc]  £  Jn  that  define  the  rows  and  columns  of  A^  £  Cfcxfc,  whose  intersections 
form  A^\  (Here  is  the  index  vector  obtained  when  compressing  the  i-th  row 
of  blocks  and  is  the  index  vector  obtained  when  compressing  the  j-th  column 
of  blocks.) 


li 


4.3.  Multi-level  compression.  The  single-level  technique  compresses  a  block  ma¬ 
trix  A  to  form  another  block  matrix  A  with  smaller  blocks.  Now,  if  by  clustering 
blocks,  we  can  create  rank-deficiencies  in  the  neutered  rows  and  columns  of  A,  then 
the  single-level  technique  can  be  applied  recursively.  The  algorithmic  implementa¬ 
tion  entirely  follows  the  description  in  Observation  9. 

When  estimating  the  computational  cost  for  the  multi-level  technique  we  use 
r  =  1 , . . . ,  R  as  an  index  for  the  levels  (with  r  —  1  being  the  finest  level) ,  we  let  pr 
denote  the  number  of  blocks  on  level  r,  nr  the  average  block  size  and  kr  the  average 
rank.  The  cost  for  step  r  is  then 

(4.1)  tT  ~  krp2n2  +prn^. 

We  assume  that  prkr  >  nT  so  that  the  second  term  is  dominated  by  the  first.  Using 
that  prkT  —  pr+irar+i,  we  then  find  that  the  total  cost  for  all  R  steps  is 

R  R 

(4.2)  T  ~  tr  ~  ^2  Pr+lPr^r+lR-r  ■ 

r=l  r= 1 

At  each  level,  the  number  of  blocks  is  cut  in  half,  so 

(4-3)  Pr  =  ~v 

We  let  7r  =  nr+i/2nr  denote  the  compression  ratio  so  that 

(4.4)  nr  =  (27r_i)  •  •  •  (271  )nv 

Assuming  that  there  exists  a  constant  7  such  that  7r  <  7,  we  obtain  the  bound 

(4.5)  nT  <  {2ry)r~lni. 

Combining  (4.2),  (4.3)  and  (4.5),  we  find  that  the  total  cost  is 

(4.6)  T  ~  ]T  (27 Y  ni  (2j)2r~2  n\  ~  p\n\  (rfY  ■ 

r=l  r— 1 

We  assume  that  7  <  \/4  —  0.7937-  •  •  so  that  the  sum  is  bounded  by  (1  —  27s)-1. 
Letting  N  denote  the  size  of  the  matrix  we  find  that  N  —  pin\  and  thus 

(4.7)  T  ~  N2n\. 

The  assumption  that  (4.5)  holds  for  some  7  <  0.7939  ■  •  •  is  valid  in  many  environ¬ 
ments  relating  to  discretization  of  contour  integral  equations.  We  will  return  to  this 
point  in  Section  6. 

4.4.  Conditioning.  All  factorizations  computed  in  this  section  are  variations  of 
(3.15).  For  this  formula  to  be  of  practical  use,  the  matrices  Bi,  Ci  and  Di  must  not 
be  excessively  large  (in  say  the  l2  operator  norm)  and  the  condition  number  of  A  has 
to  be  similar  to  that  of  A.  The  formulas  (3.12)  imply  that  this  is  true  if  ||-X’^21||2  is 
of  moderate  size  (since  (2.19)  and  (2.21)  assert  that  R  and  L  are  well-conditioned). 
Under  the  assumptions  of  this  section  (that  the  global  matrix  be  non-singular  and 
the  off-diagonal  blocks  have  low  rank)  it  is  not  possible  to  prove  any  such  bound. 
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However,  in  the  context  of  contour  integral  equations,  the  problem  can  largely  be 
avoided  by  enforcing  that  the  compression  be  symmetric  in  the  sense  of  Remark  7. 
The  reason  is  that  the  diagonal  blocks  of  the  original  matrix  tend  to  have  the  form 

(4.8)  A*11)  =  D  +  E, 

where  D  is  a  positive  definite  Hermitian  matrix  and  E  is  “small”  compared  to  D  in 
operator  norm.  Since  R2  =  L\  when  symmetry  is  enforced,  we  find  that,  cf.  (3.6), 

(4.9)  *22  =  L2(D  +  E)Ll  =  ( L2D V2)  (. L2Dl>2y  +  L2EL\. 

Here,  the  first  term  is  well-conditioned,  and  the  second  has  at  most  a  few  non-small 
singular  values.  Thus,  it  is  very  unlikely  that  the  sum  of  the  two  matrices  should 
have  any  small  singular  values.  Furthermore,  should  such  a  coincidence  happen,  the 
algorithm  detects  it  and  avoids  the  problem  by  locally  re-partitioning  the  matrix. 

4.5.  Error  estimation.  Given  a  prescribed  accuracy  e,  the  numerical  scheme  pre¬ 
sented  in  this  section  solves  the  equation 

(4.10)  Au  =  f 

by  constructing  an  approximation  Ae  that  satisfies 

(4.11)  \\A-Ae\\2<£ 

and  is  such  that  the  approximate  solution  ue  =  A~l  f  can  be  computed  fast.  The 
error  in  u  satisfies 

(4.12)  u-ue=  (. A-1  -  A71)  /  =  A J1  (Ae  -  A)  A~lf  =  A 71  (Ae  -  A)  u. 

The  relative  error  is  therefore  bounded  as  follows: 

(4.13)  <  || A-1  (A,  -  A)  ||a  <  spr'lk- 

While  the  algorithm  cannot  possibly  control  1 1^4^'1|  I2,  this  quantatity  can  be  com¬ 
puted  cheaply  using  power  iteration,  see  Remark  10.  Thus,  an  assured  bound  for 
the  relative  error  can  be  computed  a  posteriori. 

5.  An  accelerated  algorithm  applicable  to  contour  integral 

EQUATIONS 

The  bulk  of  the  computational  cost  of  the  matrix  compression  technique  presented 
in  Section  4  consists  of  the  cost  of  determining  index  vectors  and  transformation 
matrices  that  compress  the  neutered  rows  and  columns.  When  the  matrix  is  a 
discrete  approximation  of  a  contour  integral  operator,  it  is  possible  to  determine 
these  quantities  through  an  entirely  local  operation  whose  cost  only  depends  on 
the  size  of  the  diagonal  block  to  be  compressed  (he.,  not  on  the  size  of  the  rest  of 
the  matrix).  This  is  possible  since  the  column  and  row  operations  employed  in  the 
present  matrix  compression  technique  do  not  update  the  elements  of  the  off-diagonal 
blocks,  as  discussed  in  Remark  13. 

This  section  is  structured  as  follows:  In  Section  5.1  we  describe  a  single-block 
compression  technique  for  the  boundary  integral  equations  associated  with  Laplace’s 
equation  in  two  dimensions  that  is  faster  than  the  generic  single-block  technique  of 
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Figure  3.  The  contour  T.  In  Fig.  (a),  the  partitioning  T  =  Ti  +  T2 
is  shown  with  T i  drawn  with  a  bold  line.  In  Fig.  (b)  the  contour  f  2 
is  drawn  with  a  thin  solid  line  and  rext  with  a  dashed  line. 


Section  4.1.  In  Section  5.2  we  describe  single  and  multi-level  techniques  for  contour 
integral  equations  obtained  by  repeated  application  of  the  single-block  compression 
technique  of  Section  5.1.  Section  5.3  discusses  generalizations  of  the  technique  to 
other  equations  of  potential  theory. 

Remark  14  (Rank-deficient  matrices).  In  this  section,  we  say  that  a  matrix  has 
rank  k  provided  that  it  has  only  k  singular  values  that  are  larger  than  some  preset 
accuracy.  In  other  words,  we  do  not  distinguish  between  what  is  sometimes  called 
“numerical  rank”  and  actual  rank. 

5.1.  Single-block  compression.  The  following  observation  summarizes  the  prin¬ 
ciple  finding  of  this  section: 

Observation  15.  Let  the  matrix  A  in  (3.2)  represent  the  discretization  of  the 
integral  operator 

(5.1)  JrK(x’ ds(y)i  for  x  ^  r, 

where  r  =  ri  +  r2isa  contour  (Fig.  3  shows  one  example),  the  block  structure  of  A 
corresponds  to  the  partitioning  of  T  (so  that,  e.g.,  A(-12'>  represents  evaluation  on  Fi 
of  the  potential  generated  by  a  charge  distribution  on  T2),  and  K  is  the  kernel  of  a 
single  and/or  double  layer  potential  for  the  Laplace  operator.  Then  the  factorization 
(3.3)  can  be  computed  using  0(n3)  floating  point  operations,  where  n  is  the  number 
of  points  used  in  the  discretization  of  Ti. 

The  idea  behind  the  construction  alluded  to  in  Observation  15  is  simple:  Instead 
of  compressing  the  interaction  between  Ti  and  T2,  it  is  sufficient  to  compress  the 
interaction  between  Fi  and  a  small  contour  formed  by  the  union  of  an  artificial 
circular  contour  enclosing  Ti  and  the  part  of  T2  that  is  inside  this  circle  (as  shown 
in  Fig.  3(b)).  The  reason  is  that  by  virtue  of  Green’s  theorem,  any  potential  field 
generated  by  charges  on  T2  can  equally  well  be  generated  by  charges  on  fv  Finally 
we  note  that  if  Ti  is  discretized  using  n  nodes,  then  can  be  discretized  using  0(n) 
nodes,  yielding  a  total  cost  for  the  procedure  of  0(n3). 
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The  remainder  of  this  subsection  is  devoted  to  substantiating  Observation  15. 
We  start  by  introducing  some  notation;  let  rcirc  denote  the  circle  in  Fig.  3(b)  and 
let  Text  denote  the  part  of  F2  outside  of  rcirc-  Furthermore,  let  Sr2^r,.  denote  the 
integral  operator  that  evaluates  a  potential  on  Fi  caused  by  a  charge  distribution 
on  r2.  In  other  words,  Sr2_ri  acts  on  a  charge  distribution  u  as  follows: 

(5.2)  [Sr2-*rj «](*)=  /  K(x,y)u(y)ds(y),  for  x  e  Ti. 

J  r2 

Observation  15  rests  on  the  following  claim: 

Lemma  4.  Let  H  €  C"xn  denote  the  matrix  discretizing  Srcirc_ri>  an<^  ^  ^ie 
index  vector  Jr  €  and  the  transformation  matrix  L  be  such  that  they  compress  H 
in  the  sense  of  Observation  6.  Then  Jr  and  L  also  compress  the  matrix  B  £  <Cnxm 
that  approximates  the  operator  Srext_r1  • 


Sketch  of  proof:  It  is  sufficient  to  prove  that  there  exists  a  matrix  W  £  Cn'xm 
with  moderate  l2  operator  norm  such  that 

(5.3)  B  =  HW. 


(The  matrix  W  is  the  matrix  that  maps  a  charge  distribution  on  rext  to  an  equivalent 
charge  distribution  on  rcjrc.)  Now,  equation  (5.3)  is  the  discrete  approximation  of 
the  operator  relation 


(5.4) 


Srext-+ri  Sfcirc-^r!  [(Srci, 


-rcirc)  'Sr. 


circ 


The  matrix  W  in  (5.3)  corresponds  to  the  operator  in  square  brackets  in  (5.4).  That 
this  operator  is  bounded  is  a  consequence  of  Green’s  theorem.  □ 


5.2.  Single-  and  multi-level  compression.  The  generic  single-  and  multi-level 
compression  techniques  of  Sections  4.2  and  4.3  were  obtained  by  repeated  applica¬ 
tion  of  the  single-block  technique  described  in  Section  4.1.  Single-  and  multi-level 
techniques  for  contour  integral  equations  are  analogously  obtained  by  repeated  ap¬ 
plication  of  the  single-block  technique  of  Section  5.1. 

The  remainder  of  this  subsection  is  devoted  to  estimating  the  computational  cost 
of  the  accelerated  compression  technique.  The  cost  for  a  single  level  compression  at 
level  r  =  1, . . . ,  R  is  now,  cf.  (4.1), 

(5.5)  tr  ~  prnzT, 

where  pT  denotes  the  number  of  clusters  on  level  r  and  nr  is  the  (average)  cluster 
size.  Under  the  assumptions  (4.3)  and  (4.5),  we  find  that 

(5-6)  tr  ~  ^T(27)3-3nf. 

The  total  cost  for  all  R  steps  is  then 

Ft  R 

(5.7)  T~^prn3  <pinf^(473)r_1. 

r=l  r= 1 
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We  assume  that  7  <  4  1/3  =  0.630-  •  •  so  that  the  sum  is  bounded  by  (1  —  4y3)  1. 
Letting  N  denote  the  size  of  the  original  matrix,  we  find  that  N  =  n\pi  and  thus 

(5.8)  T  ~  Nn\. 

When  the  kernel  of  the  equation  is  associated  with  the  fundamental  solution  of 
Laplace’s  equation,  it  is  possible  to  prove  that  the  assumption  (4.5)  holds  with 
7  w  1/2  when  n\  >  log  N,  which  gives  an  upper  bound  on  the  computational  cost 
of  0(N  log2  N).  However,  further  acceleration  is  achieved  by  choosing  a  smaller  n\, 
even  though  the  cluster  size  then  grows  slightly  in  the  first  couple  of  compressions. 
This  explains  why  the  log2  N  factor  is  not  visible  in  the  experiments  in  Section  6. 

Remark  16.  The  single-block  compression  technique  described  in  Observation  15 
requires  the  algorithm  to  determine  which  of  the  nodes  of  T2  lie  inside  the  artificial 
circle  Tcirc.  If  this  search  would  be  done  by  brute  force,  the  computational  cost  for 
a  single  level  solve  would  include  a  term  p2n2,  cf.  (5.5).  Even  though  the  constant 
in  front  of  this  term  is  small,  it  would  dominate  the  computation  for  large  problems 
(in  our  implementation,  this  would  happen  for  N  >  25000).  One  solution  to  this 
problem  is  to  perform  the  search  via  a  hierarchical  search  tree;  the  estimate  (5.5) 
then  remains  valid. 

5.3.  Generalizations.  The  technique  presented  in  Section  5.1  for  Laplace’s  equa¬ 
tion  is  readily  applicable  to  other  equations  of  classical  potential  theory;  Helmholtz, 
Yukawa,  Shrodinger,  Maxwell,  Stokes,  elasticity,  et  c.  The  only  complication  occurs 
when  working  with  equations  that  may  have  resonances.  In  such  cases,  it  is  possible 
that  the  operator  of  self-interaction  for  the  artificial  circle  (the  operator  Srcirc-.rcirc 
in  (5.4))  has  a  non-trivial  nullspace.  This  complication  can  be  rectified  by  letting 
the  artificial  charges  on  rcirc  consist  of  both  monopoles  and  dipoles.  Alternatively, 
it  is  possible  to  consider  only  one  type  of  charges  but  placing  them  on  two  concentric 
circles  instead  of  a  single  one. 

When  applied  to  oscillatory  problems  such  as  Helmholtz’  and  Maxwell’s  equations, 
the  efficiency  of  the  technique  deteriorates  when  the  wave  number  increases  since 
then  the  compression  rate  deteriorates  as  the  blocks  grow  larger  (in  other  words, 
the  assumption  (4.5)  no  longer  holds).  In  practise,  it  appears  that  the  method  ex¬ 
periences  very  few  problems  for  objects  smaller  than  about  50  wavelengths.  After 
that,  the  computational  complexity  increases  superlinearly  with  the  problem  size 
although  the  technique  remains  viable  for  equations  set  on  contours  about  a  thou¬ 
sand  wavelengths  in  size.  This  effect  will  be  illustrated  in  the  numerical  examples 
in  Section  6.1. 

Finally  we  remark  that  the  scheme  has  0(n  log*  n)  complexity  when  applied  to 
integral  equations  defined  on  one-dimensional  curves  in  any  dimension.  The  fact 
that  we  have  so  far  only  discussed  equations  embedded  in  two  space  dimensions  is 
simply  that  contour  integral  equations  associated  with  boundary  value  problems  in 
two  dimensions  is  the  most  common  source  of  such  equations. 
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6.  Numerical  examples 


In  this  section  we  present  the  results  of  a  number  of  numerical  experiments  per¬ 
formed  to  assess  the  efficiency  of  the  numerical  scheme  presented  in  Sections  4  and 
5.  In  every  experiment,  we  compute  a  sparse  factorization  of  the  inverse  of  the 
matrix  resulting  from  Nystrom  discretization  of  one  of  the  following  three  integral 
equations: 

(6.1)  ±^u(x)  +  -^  J^\n(y) -Vy  log  \x-y\]u(y)ds(y)=f(x),  xeT, 

(6.2)  J^[log\x-y\]u{y)ds(y)  =f(x),  xcT, 

(6.3)  ^2\u(x)  +  ■  Vy  +  ik)  H0(k\x  -  y\)]u(y)  ds(y)  =f(x),  x  G  T, 

where  n(y)  is  the  outward  pointing  unit  normal  of  T  at  y  and  Hq(x)  =  Jq(x)  +  Do(:e) 
is  the  Hankel  function  of  zeroth  order.  The  equations  (6.1)  and  (6.2)  are  the  double 
and  single  layer  equations  associated  with  Laplace  Dirichlet  problems,  and  (6.3)  is 
an  equation  associated  with  the  Helmholtz  Dirichlet  problem  with  wave  number  k. 

In  equations  (6.1)  and  (6.3),  the  top  sign  in  front  of  the  first  term  refers  to  exterior 
problems  and  the  lower  sign  refers  to  interior  problems. 

The  kernel  in  (6.1)  is  smooth  and  the  equation  was  discretized  using  the  trape¬ 
zoidal  rule  (which  is  exponentially  convergent  on  a  smooth  contour).  The  equations 
(6.2)  and  (6.3)  involve  log-singular  kernels  that  were  discretized  using  the  modified 
trapezoidal  quadrature  rules  of  [9]  of  orders  6  and  10,  respectively.  The  algorithm 
was  implemented  in  Matlab  and  the  experiments  were  run  on  a  2.8  GHz  Pentium  4 
desktop  with  512Mb  of  memory. 

When  presenting  the  numerical  results,  we  use  the  following  notation: 

R  the  number  of  levels  in  the  multi-level  solver, 

-Nstart  the  size  of  the  discrete  problem  at  the  start, 

iVgnai  the  size  of  the  compressed  problem, 

ftot  the  total  CPU  time  (in  seconds), 

fsolve  the  CPU  time  required  to  apply  the  factorized  inverse  (in  seconds), 

ctop  the  condition  number  of  the  compressed  matrix, 

^min  the  smallest  singular  value  of  the  original  matrix, 

M  the  amount  of  memory  used  (in  kB), 

^actual  the  relative  error  in  u.  Eact uai  =  \\u£  -  u||/||u||, 

Eies  the  relative  residual  error,  Eres  =  \\Aue  —  /||/||/||, 

In  each  experiment,  the  right  hand  side  /  was  the  Dirichlet  data  corresponding  to 
a  potential  field  generated  by  a  few  randomly  placed  point  charges.  Since  the  exact 
potential  field  was  known,  we  were  able  to  compare  the  potential  field  generated  by 
the  numerical  solution  to  the  exact  one.  We  did  this  at  J  random  points  on  a  circle 
separated  from  T  by  half  its  radius.  Letting  {v^}j=1  denote  the  exact  potential 

and  {ve^}j=i  denote  the  potential  generated  by  u£,  we  define  the  relative  error  in 
the  potential  as  Evot  =  ||u  —  u£||/]|u||. 
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FIGURE  4.  A  smooth  contour.  The  length  of  the  contour  is  roughly 
5.1  and  its  horizontal  width  is  2. 


6.1.  Example:  A  smooth  contour.  In  this  subsection  we  present  results  per¬ 
taining  to  the  smooth  contour  shown  in  Fig.  4.  The  contour  was  discretized  using 
between  800  and  102400  points  and  the  integral  equations  associated  with  exterior 
Dirichlet  problems  were  solved.  Tables  1,  2  and  3  present  the  results  for  the  kernels 
(6.1),  (6.2)  and  (6.3),  respectively. 

For  the  two  Laplace  problems  considered,  we  see  that  both  the  computational 
cost  and  the  memory  requirement  scale  more  or  less  linearly  with  the  problem  size, 
as  expected.  We  recall  that  this  expectation  was  based  on  the  postulate  that  for 
Laplace  problems,  the  interaction  rank  between  adjacent  clusters  depend  only  very 
weakly  (logarithmically)  on  their  size.  Fig.  5  illustrates  this  point;  it  shows  that 
after  two  rounds  of  compression,  almost  the  only  nodes  that  have  survived  are  the 
ones  near  the  border  to  the  neighboring  clusters.  The  figure  also  illustrates  that  the 
algorithm  detects  the  need  to  keep  more  nodes  in  the  interior  of  those  clusters  that 
run  close  to  other  clusters.  We  note  however  that  these  experiments  are  somewhat 
artificial  in  that  they  use  an  excessive  number  of  discretization  points  (for  instance, 
the  quadrature  error  associated  with  the  N  =  102400  experiment  for  the  Laplace 
double  layer  potential  corresponds  to  a  quadrature  error  of  roughly  10~loo°). 

Since  the  scheme  presented  in  this  paper  relies  on  rank-considerations  only,  it 
works  for  oscillatory  problems  with  low  wave  numbers  but  it  eventually  fails  as 
the  wavenumber  is  increased.  Table  4  illustrates  this  point  by  showing  how  the 
compression  ratios  deteriorate  as  the  wavenumber  k  in  the  kernel  (6.3)  is  increased 
from  1  to  500.  However,  the  authors  were  surprised  to  find  that  the  method  remains 
viably  up  to  objects  about  1000  wavelengths  across,  as  indicated  in  Table  3. 

Remark  17  (Comparison  with  the  Fast  Multipole  Method).  When  the  Fast  Multi¬ 
pole  Method  is  applied  to  solve  the  Laplace  Dirichlet  problem  on  the  contour  shown 
in  Fig.  4,  the  CPU  time  required  for  a  single  matrix-vector  multiply  is  roughly  30 
times  smaller  than  the  CPU  time  required  to  construct  the  inverse  of  the  matrix. 
Thus,  if  less  than  30  iterations  are  required  in  an  iterative  solver  (which  is  the  case 
for  well-conditioned  problems),  the  FMM  is  faster  for  a  single  solve.  However,  the 
cost  of  a  single  FMM  matrix- vector  multiply  is  about  3  times  larger  than  the  cost  to 
apply  the  inverse  to  a  vector,  ensuring  that  for  problems  involving  several  right-hand 
sides,  the  direct  solver  will  outperform  the  FMM. 


18 


R 

Af start 

Affinal 

^tot 

^solve 

■^actual 

Eres 

Epot 

Ctop 

^min 

M 

3 

800 

335 

1.2e+0 

1.2e-2 

4.0e-09 

3.4e-9 

5.6e-10 

1.9e+2 

1.5&-2 

4250 

4 

1600 

368 

4.2e+0 

2.4e-2 

l.le-09 

2.4e-9 

4.4e-10 

3.6e+2 

1.4e-2 

6627 

5 

3200 

369 

8.4e+0 

5.1e-2 

— 

2.7e-9 

6.4e-10 

3.6e+2 

1.4e-2 

8987 

6 

6400 

369 

9.0e+0 

6.5e-2 

— 

3.3e-9 

1.6&-10 

8.1e+2 

1.4e-2 

13301 

7 

12800 

369 

1.3e+l 

1.2e-l 

— 

2.6e-9 

6.6&-10 

1.6e+3 

1.4e-2 

21991 

8 

25600 

370 

1.6e+l 

2.4e-l 

— 

3.8e-9 

2.7e-10 

1.6e+3 

1.4e-2 

39970 

9 

51200 

371 

3.5e+l 

4.9e-l 

— 

2.9e-9 

4.9e-10 

2.0e+3 

— 

76346 

10 

102400 

375 

9.0e+l 

9.8e-l 

— 

1.3e-9 

— 

1.2e+4 

— 

151571 

Table  1.  Computational  results  for  the  double  layer  potential  (6.1) 
associated  with  an  exterior  Laplace  Dirichlet  problem  on  the  contour 
shown  in  Fig.  4. 


R 

Af start 

Affinal 

^tot 

^solve 

-^actual 

■Eres 

Epot 

Ctop 

G  min 

M 

3 

800 

251 

1.9e+00 

l.le-02 

1.8e-07 

1.86-09 

6.4e-07 

8.9e+04 

8.9e-05 

3043 

4 

1600 

265 

3.4e+00 

1.9e-02 

1.7e-07 

2.2e-09 

9.8e-10 

2.1e+04 

2.2e-05 

5085 

5 

3200 

279 

3.8e+00 

3.4e-02 

— 

1.6e-09 

2.1e-10 

2.2e+04 

8.6&-06 

8426 

6 

6400 

287 

l.le+01 

1.9e-01 

— 

2.16-09 

1.2e-10 

1.6e+05 

2.2e-07 

14483 

7 

12800 

295 

1.2e+01 

1.3e-01 

— 

1.5e-09 

3.5e-10 

1.2e+04 

4.9e-07 

26327 

8 

25600 

305 

1.4e+01 

2.5e-01 

— 

2.7e-09 

3.1e-10 

4.0e+04 

1.0e-07 

49703 

9 

51200 

317 

2.8e+01 

4.9e-01 

— 

2.2e-09 

2.3e-10 

8.0e+03 

— 

96228 

10 

102400 

322 

8.2e+01 

9.8e-01 

— 

2.06-09 

— 

1.9e+04 

— 

189065 

Table  2.  Computational  results  for  the  single  layer  potential  (6.2) 
associated  with  an  exterior  Laplace  Dirichlet  problem  on  the  contour 
shown  in  Fig.  4. 


k 

A^start 

Affinal 

ttot 

^solve 

-^actual 

Eres 

Ep0t 

Ctop 

C'min 

M 

21 

800 

435 

1.5e+01 

3.3e-02 

2.7e-07 

9.7e-08 

7.1e-07 

4.1e+03 

6.5e-01 

12758 

40 

1600 

550 

3.0e+01 

6.7e-02 

1.6e-07 

6.2e-08 

4.0e-08 

6.1e+03 

8.0e-01 

25372 

79 

3200 

683 

5.3e+01 

1.2e-01 

— 

5.3e-08 

3.8e-08 

2.1e+04 

3.4e-01 

44993 

158 

6400 

870 

9.2e+01 

2.0e-01 

— 

3.9e-08 

2.9e-08 

4.0e+04 

3.4e-01 

81679 

316 

12800 

1179 

1.8e+02 

3.9e-01 

— 

2.3e-08 

2.0e-08 

4.2e+04 

3.4e-01 

160493 

632 

25600 

1753 

4.3e+02 

7.5e-)-00 

— 

1.7e-08 

1.4e-08 

9.0e+04 

3.3e-01 

350984 

1264 

51200 

2864 

(1.5e+03) 

(2.3e+02) 

— 

9.5e-09 

— 

— 

— 

835847 

Table  3.  Computational  results  for  the  kernel  (6.3)  associated  with 
an  exterior  Helmholtz  Dirichlet  problem  on  the  contour  shown  in 
Fig.  4.  The  Helmholtz  parameter  was  chosen  to  keep  the  number  of 
discretization  points  per  wavelength  constant  at  roughly  45  points 
per  wavelength  (resulting  in  a  quadrature  error  about  10-12).  The 
times  in  parenthesis  refer  to  experiments  that  did  not  fit  in  RAM. 
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FIGURE  5.  The  points  left  after  two  rounds  of  compression  of  the 
contour  shown  in  Fig.  4.  The  crosses  mark  the  boundary  points 
between  adjacent  clusters. 


k 

71 

72 

73 

74 

75 

76 

77 

78 

•^final 

M 

1 

0.68 

0.58 

0.54 

0.55 

0.58 

0.64 

0.64 

0.72 

512 

166908 

100 

0.72 

0.56 

0.55 

0.56 

0.60 

0.68 

0.72 

0.82 

777 

195882 

500 

0.72 

0.58 

0.58 

0.62 

0.68 

0.76 

0.84 

0.91 

1522 

303452 

TABLE  4.  This  table  shows  to  which  extent  the  assumption  (4.5) 
of  constant  compression  ratios  fails  for  the  Helmholtz  problem  with 
large  wave-numbers.  It  displays  the  compression  ratios  7 j  at  each  of 
the  levels  j  —  1, . . . ,  8  for  the  Helmholtz  kernel  (6.3)  on  the  smooth 
contour  in  Fig.  4,  discretized  with  N  =  25  600  points.  The  three 
rows  correspond  to  wave  numbers  k  —  1, 100, 500.  The  second  to  last 
column  shows  the  number  of  degrees  of  freedom  left  on  the  finest 
level  and  the  last  column  shows  the  total  memory  requirement  (in 
kB). 


20 


(a)  (b) 


Figure  6.  (a)  A  rippled  contour,  (b)  A  close-up  of  the  area  marked 
by  a  dashed  rectangle  in  (a).  The  horizontal  axis  of  the  contour 
has  length  1  and  the  number  of  ripples  change  between  the  different 
experiments  to  keep  a  constant  ratio  of  80  discretization  nodes  per 
wavelength. 

6.2.  A  rippled  contour  that  almost  self-intersects.  In  this  subsection  we  present 
results  pertaining  to  the  rippled  contour  shown  in  Fig.  6.  The  contour  was  discretized 
using  between  800  and  102400  points  and  integral  equations  associated  with  exterior 
Dirichlet  problems  were  solved.  The  number  of  ripples  in  the  experiments  increase 
with  the  number  of  discretization  nodes  in  such  a  fashion  that  there  are  roughly 
80  nodes  for  each  wavelength.  Tables  5,  6  and  7  present  the  results  for  the  kernels 
(6.1),  (6.2)  and  (6.3),  respectively. 

We  see  that  the  asymptotic  complexity  of  the  algorithm  remains  essentially  the 
same  as  for  the  smooth  contour  shown  in  Fig.  4.  However,  the  constants  involved 
are  larger  since  more  degrees  of  freedom  are  required  to  resolve  the  contour  at  the 
finest  levels. 
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R _ N start  Affinal _ ^tot _ ^solve  -^actual _ Eres _ Epot  ^top _ ^min  M 


2 

400 

160 

2.4e-01 

4.6e-03 

2.3e-09 

2.0e-09 

1.2e-09 

7.1e+01 

4.0e-02 

954 

3 

800 

214 

4.7e-01 

8.9e-03 

2.3e-09 

2.5e-09 

2.8e-10 

1.3e+02 

3.1e-02 

2110 

4 

1600 

286 

7.5e+00 

2.6e-02 

1.9e-09 

2.1e-09 

9.8e-ll 

2.6e-f02 

2.2e-02 

4710 

5 

3200 

361 

l.le+01 

3.7e-02 

— 

1.4e-09 

1.8e-10 

5.3e+02 

1.8e-02 

9781 

6 

6400 

437 

1.5e+01 

7.2e-02 

— 

2.0e-09 

1.3e-10 

l.le+03 

1.5e-02 

20484 

7 

12800 

508 

2.1e+01 

1.5e-01 

— 

1.6e-09 

9.2e-ll 

2.3e+03 

1.4e-02 

42307 

8 

25600 

559 

3.7e+01 

2.9e-01 

— 

2.0e-09 

1.3e-10 

5.4e+03 

1.3e-02 

86481 

9 

51200 

599 

8.0e+01 

6.1e-01 

— 

1.8e-09 

2.8e-10 

1.5e+04 

— 

177442 

10 

102400 

634 

1.9e+02 

1.2e+00 

— 

1.4e-09 

— 

2.2e-f04 

— 

365495 

Table  5. 

Computational  results  for  the  double  layer  potential  (6.1) 

associated  with  an  exterior  Laplace  Dirichlet  problem 

on  the  rippled 

contour  shown  in  Fig.  6. 

R 

N start 

N final 

ttot 

^solve 

•^actual 

etos 

Epot 

Ctop 

^min 

M 

2 

400 

152 

1.8e-01 

4.5e-03 

2.0e-07 

3.6e-09 

1.4e-06 

3.4e+03 

5.5e-04 

953 

3 

800 

188 

4.2e-01 

8.7e-03 

4.3e-06 

2.9e-09 

5.0e-07 

7.9e+04 

1.0e-05 

2050 

4 

1600 

216 

4.1e+00 

2.3e-02 

6.9e-07 

2.3e-09 

1.2e-08 

5.1e+03 

1.6e-05 

4086 

5 

3200 

240 

5.7e+00 

3.4e-02 

— 

2.8e-09 

1.0e-08 

3.5e+05 

1.2e-05 

7943 

6 

6400 

268 

l.le+01 

6.6e-02 

— 

2.2e-09 

2.3e-09 

7.3e+04 

2.1e-06 

15592 

7 

12800 

284 

l.le+01 

1.3e-01 

— 

3.3e-09 

2.0e-09 

1.9e+04 

1.7e-07 

30607 

8 

25600 

300 

1.6e+01 

2.6e-01 

— 

2.4e-09 

8.1e-10 

9.5e+05 

9.7e-08 

60443 

9 

51200 

310 

3.3e+01 

5.2e-01 

— 

2.8e-09 

3.1e-10 

2.0e+05 

— 

120089 

10 

102400 

323 

9.0e+01 

1.0e+00 

— 

1.8e-09 

— 

7.9e+04 

— 

239012 

Table  6. 

Computational  results  for  the  single  layer  potential  (6.2) 

associated  with  an  exterior  Laplace  Dirichlet  problem 

on  the  rippled 

contour  shown  in  Fig.  6. 

k 

N start 

■^Vfinal 

ttot 

^solve  -^actual  E^es  Epot  Ctop  & . i 

nin 

7 

400 

224 

2.9e+00 

9.0e-03 

1.4e-07 

6.9e-08 

9.4e-07 

1.2e+04 

7.9e-01 

3241 

15 

800 

320 

7.7e+00 

1.9e-02 

1.6e-07 

7.4e-08 

1.2e-07 

3.9e+03 

7.9e-01 

8233 

29 

1600 

470 

2.1e+01 

4.6e-02 

— 

6.7e-08 

8.1e-08 

7.4e+03 

7.8e-01 

20469 

58 

3200 

704 

6.1e+01 

l.le-01 

— 

5.2e-08 

6.4e-08 

1.2e+04 

8.0e-01 

49854 

115 

6400 

1122 

1.4e+02 

2.9e-01 

— 

4.8e-08 

7.5e-08 

1.4e+04 

8.0e-01 

126576 

230 

12800 

1900 

(4.7e+02) 

(2.5e+01) 

— 

5.5e-08 

7.5e-08 

8.8e+04 

8.0e-01 

341054 

461 

25600 

3398 

— 

— 

— 

— 

— 

— 

— 

983061 

TABLE  7.  Computational  results  for  the  kernel  (6.3)  associated  with 
an  exterior  Helmholtz  Dirichlet  problem  on  the  rippled  contour  shown 
in  Fig.  6.  The  Helmholtz  parameter  k  was  chosen  to  keep  the  number 
of  discretization  points  per  wavelength  constant  at  roughly  55  points 
per  wavelength  (resulting  in  a  quadrature  error  about  10~12).  The 
times  in  parenthesis  refer  to  experiments  that  did  not  fit  in  RAM. 
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Figure  7.  A  contour  the  shape  of  a  smooth  pentagram.  Its  diameter 
is  2.5  and  its  length  is  roughly  8.3. 

6.3.  An  interior  problem  close  to  a  resonance.  In  this  section  we  present 
results  pertaining  to  interior  Dirichlet  problem  on  the  contour  shown  in  Fig.  7. 
While  interior  and  exterior  Laplace  Dirichlet  problems  are  quite  similar  in  nature,  the 
corresponding  Helmholtz  Dirichlet  problems  are  fundamentally  different  in  that  the 
interior  problem  possesses  resonances  while  the  exterior  does  not.  We  will  therefore 
focus  exclusively  on  interior  Helmholtz  problems 

We  present  the  results  of  two  computational  experiments,  both  relating  to  the 
Helmholtz  kernel  (6.3).  In  the  first  experiment,  we  scan  a  range  of  wave  numbers 
k  between  99.9  and  100.1.  For  each  wave  number,  we  computed  the  smallest  sin¬ 
gular  value  <Tmjn  of  the  integral  operator  using  the  iteration  technique  described  in 
Section  4.5.  The  resulting  graph  of  crmin  versus  k,  shown  in  Fig.  8,  clearly  indicates 
the  location  of  each  resonance  in  this  interval.  The  second  experiment  consists  of 
factoring  the  inverse  of  the  matrix  corresponding  to  k  —  100.0110276  ■  •  ■  for  which 
Cmin  —  0.00001366  •  •  ■ .  The  results,  shown  in  Table  8,  illustrate  that  the  method 
does  not  experience  any  difficulty  in  factoring  the  inverse  of  an  ill-conditioned  ma¬ 
trix.  In  particular,  the  table  shows  that  the  factorization  matrices  B^\  C^)  and 
Z)L)  ,  see  (3.17),  are  well-conditioned. 
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99.9  99.92  99.94  99.96  99.98  100  100.02  100.04  100.06  100.08  100.1 


Figure  8.  Plot  of  <rmin  versus  k  for  an  interior  Helmholtz  problem 
on  the  contour  shown  in  Fig.  7.  The  values  shown  were  computed 
using  the  iteration  technique  of  Section  4.5  applied  to  a  matrix  of 
size  N  —  6400.  Each  point  in  the  graph  required  about  60  seconds 
of  CPU  time. 


3  Pi  ni  7j  tj 

1  128  50.00  0.76  15.50 

2  64  76.00  0.59  14.32 

3  32  89.72  0.60  8.94 

4  16  107.00  0.64  6.27 

5  8  138.00  0.72  5.97 

6  4  199.50  0.80  7.76 


[IC^Iloc  !lfl(j)Hoo  Halloo 

1.12e+00  1.12e+00  4.20e-02 

3.27e+01  3.27e+01  1.75e+00 
1.63e+01  1.62e+01  9.28e-01 

9.09e+00  9.17e+00  2.41e+00 
7.32e+00  7.31e+00  3.64e+00 
3.22e-t-00  3.23e+00  3.86e-f-00 


Table  8.  Details  of  the  computation  for  the  Helmholtz  kernel 
(6.3)  associated  with  an  interior  Dirichlet  problem  on  the  smooth 
pentagram  shown  in  Fig.  7  for  the  case  N  —  6400  and  k  — 
100.011027569  ••• .  For  each  level  j,  the  table  shows  the  number 
of  clusters  pj  on  that  level,  the  average  size  of  a  cluster  rij ,  the  com¬ 
pression  ratio  7j,  the  time  required  for  the  factorization  tj  and  the 
size  of  the  matrices  B^\  and  (see  (3.17))  in  the  maximum 
norm.  For  this  computation,  ETes  =  2.8  •  10~10,  Epot  —  3.3  •  10-5  and 
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Figure  9.  The  star-fish  lattice  contour;  the  physical  distance  be¬ 
tween  two  random  points  on  the  contour  is  not  well  predicted  by 
their  distance  along  the  contour. 

6.4.  A  contour  resembling  an  area  integral.  The  final  numerical  experiment 
that  we  present  is  included  to  demonstrate  that  the  efficiency  of  the  factorization 
scheme  deteriorates  when  it  is  applied  to  a  curve  for  which  the  physical  distance 
between  two  random  points  on  the  contour  is  not  well  predicted  by  their  physical 
separation.  One  example  of  such  a  curve  is  the  star-fish  lattice  illustrated  in  Fig.  9. 
Focussing  on  the  double  layer  Laplace  problem  (6.1),  we  apply  the  factorization 
scheme  to  a  matrix  of  size  N  =  25600  and  compare  the  performance  to  that  for 
the  rippled  dumb-bell  shown  in  Fig.  6.  Table  9  shows  that  the  factorization  of  the 
matrix  related  to  the  starfish  lattice  took  almost  five  times  as  long  and  resulted  in 
a  compressed  matrix  of  over  twice  the  size. 

To  understand  the  difference  in  performance  between  the  different  contours,  we 
need  to  consider  how  the  interaction  rank  of  a  cluster  depends  on  its  size.  For  the 
contours  shown  in  Figures  4,  6  and  7,  we  have  seen  that  the  rank  of  the  interaction 
between  a  cluster  of  size  m  and  the  rest  of  the  contour  is  effectively  bounded  by 
logm.  However,  for  the  contour  shown  in  Fig.  9  the  corresponding  bound  is  \fm. 
Figures  5  and  10  illustrate  the  difference.  Thus,  the  asymptotic  complexity  of  the 
scheme  when  applied  to  a  contour  similar  to  the  star-fish  lattice  is  0(n3/2)  rather 
than  0(n  logn). 
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Contour: 

^tot 

N ■ start 

M 

Rippled  dumb-bell 

37s 

25600 

559 

86Mb 

Star-fish  lattice 

172s 

25600 

1202 

210Mb 

TABLE  9.  Test  results  for  two  experiments  concerning  the  matrix 
obtained  by  discretizing  the  double  layer  Laplace  problem  (6.1).  One 
involved  the  rippled  dumb-bell  shown  in  Fig.  6  and  the  other  the  star¬ 
fish  lattice  shown  in  Fig.  9. 


W  r^l  H 


3.5  4  4.5  5  55  6 

(a) 


Figure  10.  Fig.  (a)  shows  a  close-up  of  the  star-fish  lattice  of  Fig.  9. 
Fig.  (b)  shows  the  nodes  remaining  after  the  interaction  between 
the  cluster  formed  by  the  points  inside  the  parallelogram  and  the 
remainder  of  the  contour  has  been  compressed,  cf.  Fig.  5. 
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7.  Generalizations  and  conclusions 


We  have  presented  a  numerical  scheme  that  constructs  a  data-sparse  factorization 
of  the  inverse  of  a  matrix.  The  scheme  is  applicable  to  generic  matrices  whose  off- 
diagonal  blocks  have  rank-deficiencies  but  is  most  efficient  when  applied  to  matrices 
arising  from  the  discretization  of  integral  equations  defined  on  one-dimensional  con¬ 
tours  (although  such  integral  equations  frequently  arise  in  the  analysis  of  boundary 
value  problems  in  two  dimensions,  the  dimension  of  the  underlying  space  is  of  no 
relevance  to  the  algorithm).  For  equations  with  non-oscillatory  kernels  the  compu¬ 
tational  complexity  of  the  algorithm  is  0(n  logK  re)  for  most  contours,  where  k  —  1 
or  2,  and  n  is  the  number  of  nodes  in  the  discretization  of  the  contour. 

Comparing  our  implementations  of  the  direct  factorization  scheme  on  the  one 
hand  and  the  FMM  matrix- vector  multiplication  scheme  on  the  other,  we  observed 
(i)  that  in  a  typical  environment,  the  cost  of  constructing  a  factorization  of  the 
inverse  is  30  times  larger  than  the  cost  of  a  single  FMM  matrix- vector  multiply, 
and  (ii)  that  once  the  factorization  of  the  inverse  has  been  computed,  the  cost  to 
apply  it  to  a  vector  is  3  times  smaller  than  the  cost  of  a  single  FMM  matrix- vector 
multiply.  Thus,  if  an  iterative  solver  requires  less  than  30  steps  to  converge,  the 
iterative  solver  outperforms  the  direct  solver  for  a  single  solve.  However,  if  multiple 
right-hand  sides  are  involved,  the  direct  solver  has  a  clear  advantage. 

Since  the  scheme  is  based  on  rank  considerations  only,  it  cannot  work  for  bounday 
integral  equations  involving  highly  oscillatory  kernels.  However,  since  the  interaction 
ranks  are  determined  dynamically,  the  oscillation  must  be  quite  significant  before  the 
scheme  becomes  impracticable.  Empirically,  it  was  found  that  the  scheme  remains 
efficient  for  contours  about  1000  wavelengths  in  size. 

Another  limitation  of  the  scheme  is  that  it  does  not  achieve  optimal  efficiency 
when  applied  to  a  boundary  integral  equation  set  on  either  a  contour  similar  to  the 
one  shown  in  Fig.  9,  or  on  a  two-dimensional  surface.  In  either  case,  its  computa¬ 
tional  complexity  is  0(re3/2).  Overcoming  this  limitation  is  a  subject  of  on-going 
research. 

Finally,  we  mention  that  the  matrix  factorization  scheme  presented  in  this  paper 
can  be  modified  to  construct  certain  standard  matrix  factorizations  (such  as  the 
singular  value  decomposition) .  This  modification  will  be  reported  at  a  later  date. 
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